Uncertainty Estimation and Analysis of Categorical Web Data

Authors

  • Davide Ceolin
  • Willem Robert van Hage
  • Wan Fokkink
  • Guus Schreiber
Abstract

Web data often manifest high levels of uncertainty. We focus on categorical Web data and represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account possibly unseen categories in our samples by using the Dirichlet process. We conclude by exemplifying how these higher-order models can be used as a basis for analyzing datasets, once at least part of their uncertainty has been taken into account. We demonstrate how to use the Bhattacharyya statistical distance to quantify the similarity between Dirichlet distributions, and use such results to analyze a Web dataset of piracy attacks both visually and automatically.
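
As a rough illustration of the last step named in the abstract, the sketch below computes the Bhattacharyya distance between two Dirichlet distributions using the standard closed form D_B = (ln B(a) + ln B(b)) / 2 - ln B((a + b) / 2), where B is the multivariate beta function. The function names, the uniform prior with pseudo-count 1, and the category counts are all hypothetical choices made for this example; they are not taken from the paper.

```python
from math import lgamma


def log_multivariate_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def dirichlet_posterior(counts, prior=1.0):
    """Dirichlet posterior parameters from observed category counts.

    Assumes a symmetric Dirichlet prior with pseudo-count `prior`
    (an assumption of this sketch, not necessarily the paper's choice).
    """
    return [c + prior for c in counts]


def bhattacharyya_distance(alpha, beta):
    """Bhattacharyya distance between Dirichlet(alpha) and Dirichlet(beta)."""
    if len(alpha) != len(beta):
        raise ValueError("both Dirichlet distributions need the same categories")
    midpoint = [(a + b) / 2 for a, b in zip(alpha, beta)]
    return (
        0.5 * (log_multivariate_beta(alpha) + log_multivariate_beta(beta))
        - log_multivariate_beta(midpoint)
    )


# Hypothetical counts of three attack types observed in two regions.
region_a = dirichlet_posterior([12, 3, 5])
region_b = dirichlet_posterior([2, 10, 8])

print(bhattacharyya_distance(region_a, region_a))  # identical distributions
print(bhattacharyya_distance(region_a, region_b))  # differing distributions
```

The distance is zero for identical parameter vectors and grows as the two posteriors disagree, so it could serve as a similarity score when comparing samples category by category.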

Similar resources

Analysis of Dynamic Longitudinal Categorical Data in Incomplete Contingency Tables Using Capture-Recapture Sampling: A Case Study of Semi-Concentrated Doctoral Exam

Abstract. In this paper, dynamic longitudinal categorical data and estimation of their parameters in incomplete contingency tables are evaluated. To apply the proposed method, a study has been conducted on the data of the semi-concentrated doctoral exam of the National Organization for Educational Testing (NOET). The results of studies such as the obtained confidence intervals and calculating t...

Joint Bayesian Stochastic Inversion of Well Logs and Seismic Data for Volumetric Uncertainty Analysis

Herein, an application of a new seismic inversion algorithm in one of Iran’s oilfields is described. Stochastic (geostatistical) seismic inversion, as a complementary method to deterministic inversion, is perceived as a combination of geostatistics and seismic inversion algorithms. This method integrates information from different data sources with different scales, as prior informat...

Estimating Uncertainty of Categorical Web Data

Web data often manifest high levels of uncertainty. We focus on categorical Web data and represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account possibly unseen categories in our samples by using t...

Bayes Interval Estimation on the Parameters of the Weibull Distribution for Complete and Censored Tests

A method for constructing confidence intervals on parameters of a continuous probability distribution is developed in this paper. The objective is to present a model for an uncertainty represented by parameters of a probability density function. As an application, confidence intervals for the two parameters of the Weibull distribution along with their joint confidence interval are derived. The...

Application of truncated Gaussian simulation to ore-waste boundary modeling of Golgohar iron deposit

Truncated Gaussian Simulation (TGS) is a well-known method to generate realizations of the ore domains located in a spatial sequence. In a geostatistical framework, geological domains are normally utilized for the stationarity assumption. The ability to measure the uncertainty in the exact locations of the boundaries among different geological units is a common challenge for practitioners. As a simple a...

Publication date: 2014